Modeling dynamic prosodic variation for speaker verification
نویسندگان
چکیده
Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker’s distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual’s speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker’s f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.
منابع مشابه
Incorporating Prosodic with Acoustic information for ISCSLP’2006 Speaker Recognition Evaluation- Robust Cross-Channel Speaker Verification
In this paper, we present our speaker verification (SV) systems for the cross-channel text-independent and dependent speaker verification (TI-SV and TD-SV) tasks of ISCSLP’2006 speaker recognition evaluation (ISCSLP2006-SRE). To address the cross-channel issues and take advantage of the unique characteristics of Mandarin (i.e., tonal language), prosodic contours are modeled to assist the state-...
متن کاملA Review of Various Score Normalization Techniques for Speaker Identification System
This paper presents an overview of a state-of-the-art text-independent speaker verification system using score normalization. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly speech parameterization used in speaker verification, namely, cepstral analysis, is detailed. Normalization of scores is then explai...
متن کاملiVector Fusion of Prosodic and Cepstral Features for Speaker Verification
In this paper we apply the promising iVector extraction technique followed by PLDA modeling to simple prosodic contour features. With this procedure we achieve results comparable to a system that models much more complex prosodic features using our recently proposed SMM-based iVector modeling technique. We then propose a combination of both prosodic iVectors by joint PLDA modeling that leads to...
متن کاملProsodic features for speaker verification
In this paper we study the effectiveness of prosodic features for speaker verification. We hypothesize that prosody is linked to linguistic units such as syllables and prosodic features can be better represented with reference to the syllabic sequence. For extracting prosodic features, speech is segmented into syllablelike regions using the knowledge of vowel onset points (VOP). We use a techni...
متن کاملDuration and pronunciation conditioned lexical modeling for speaker verification
We propose a method to improve speaker recognition lexical model performance using acoustic-prosodic information. More specifically, the lexical model is trained using durationand pronunciation-conditioned word N-grams, simultaneously modeling lexical information along with their acoustic and prosodic characteristics. Support vector machines are used for modeling and scoring, with N-gram freque...
متن کامل